re-enable `sort_query_fuzzer_runner` #16491

adriangb · 2025-06-20T20:25:46Z

Related to SortQueryFuzzer found a failing case on main #16452

Re-enable the test and verify that CI fails (this may take a couple of runs).
Revert "TopK dynamic filter pushdown attempt 2" #15770 (suspected cause).
Verify that CI doesn't fail after multiple runs.

alamb · 2025-06-21T11:42:54Z

Context for anyone interested: #16452 (comment)

This reverts commit 6e83cf4.

adriangb · 2025-06-21T13:27:35Z

datafusion/common/Cargo.toml

@@ -55,6 +55,7 @@ apache-avro = { version = "0.17", default-features = false, features = [
 arrow = { workspace = true }
 arrow-ipc = { workspace = true }
 base64 = "0.22.1"
+chrono = { workspace = true }


This is temporary until the upstream bug is fixed in arrow. It also adds nothing new to the dependency tree, because arrow already depends on chrono.

adriangb · 2025-06-21T13:29:35Z

I think with these fixes to the Display impl for ScalarValue, the tests will pass consistently.

I used this script to test:

#!/usr/bin/env python3

import argparse
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
from threading import Event

def run_test(command, run_num, total_runs, stop_event):
    """Run a single test and return result"""
    if stop_event.is_set():
        return run_num, "SKIPPED", None
    
    try:
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        status = "PASS" if result.returncode == 0 else "FAIL"
        print(f"Run {run_num}/{total_runs}: {status}")
        return run_num, status, result
    except Exception as e:
        print(f"Run {run_num}/{total_runs}: ERROR - {e}")
        return run_num, "ERROR", None

def main():
    parser = argparse.ArgumentParser(description="Run a command multiple times and report failure rate")
    parser.add_argument("-P", "--parallel", type=int, default=1, help="Number of parallel jobs (default: 1)")
    parser.add_argument("-n", "--runs", type=int, default=100, help="Number of runs (default: 100)")
    parser.add_argument("-x", "--stop-on-failure", action="store_true", help="Stop at first failure")
    parser.add_argument("command", nargs=argparse.REMAINDER, help="Command to run")
    
    args = parser.parse_args()
    
    command = " ".join(args.command)
    print(f"Running command {args.runs} times with {args.parallel} parallel jobs...")
    print(f"Command: {command}")
    print("----------------------------------------")
    
    stop_event = Event()
    failures = 0
    completed_runs = 0
    failure_outputs = []
    
    with ThreadPoolExecutor(max_workers=args.parallel) as executor:
        # Submit all jobs
        futures = []
        for i in range(1, args.runs + 1):
            future = executor.submit(run_test, command, i, args.runs, stop_event)
            futures.append(future)
        
        # Process results as they complete
        for future in as_completed(futures):
            run_num, status, result = future.result()
            completed_runs += 1
            
            if status == "FAIL" or status == "ERROR":
                failures += 1
                if result and (result.stdout or result.stderr):
                    failure_outputs.append((run_num, result.stdout, result.stderr))
                if args.stop_on_failure:
                    print(f"Stopping at first failure (run {run_num})")
                    stop_event.set()
                    # Cancel remaining futures
                    for f in futures:
                        f.cancel()
                    break
    
    print("----------------------------------------")
    print("Results:")
    print(f"Total runs: {completed_runs}")
    print(f"Failures: {failures}")
    print(f"Passes: {completed_runs - failures}")
    if completed_runs > 0:
        failure_rate = (failures * 100) / completed_runs
        print(f"Failure rate: {failure_rate:.2f}%")
    else:
        print("Failure rate: 0%")
    
    # Print failure outputs
    if failure_outputs:
        print("\n" + "="*50)
        print("FAILURE OUTPUTS:")
        print("="*50)
        for run_num, stdout, stderr in failure_outputs:
            print(f"\n--- Run {run_num} ---")
            if stdout:
                print("STDOUT:")
                print(stdout)
            if stderr:
                print("STDERR:")
                print(stderr)

if __name__ == "__main__":
    main()

And was able to run with no errors:

./run-test.py -P 10 -n 600 -x cargo test --package datafusion --test fuzz -- fuzz_cases::sort_query_fuzz::sort_query_fuzzer_runner --exact --show-output

I'm running 1200 iterations now to confirm.
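
As a rough guide to what a clean streak says about a flaky test: if each run fails independently with probability p, then n consecutive passes happen with probability (1 - p)^n. The sketch below inverts that relation to bound p. The function name is made up for illustration, and independence between runs is an assumption that parallel runs on one machine may not fully satisfy.

```python
def max_failure_rate(clean_runs: int, confidence: float = 0.95) -> float:
    """Upper bound on the per-run failure probability after `clean_runs`
    consecutive passes, assuming runs are independent (hypothetical helper)."""
    if clean_runs <= 0:
        raise ValueError("clean_runs must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve (1 - p) ** clean_runs = 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / clean_runs)


for n in (600, 1200):
    bound = max_failure_rate(n)
    print(f"{n} clean runs: failure rate below {bound:.2%} at 95% confidence")
```

Under these assumptions, 600 clean runs only rule out failure rates above roughly 0.5%, so a rarer failure could still slip through; doubling the run count halves that bound.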

adriangb · 2025-06-21T13:32:49Z

I understand why, but I do find it kind of strange that Literal::new() calls Display on ScalarValue. I wonder if we could just make the Field name "lit"?
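
A toy Python analogue of the concern, not DataFusion's Rust code: when a field name is derived from the rendered value, constructing the literal can fail whenever rendering fails, while a constant name never touches the formatter. `Unprintable` and both helper functions are hypothetical names invented for this sketch.

```python
class Unprintable:
    """Stand-in for a value whose string rendering can fail."""

    def __str__(self) -> str:
        raise OverflowError("value out of range for display")


def field_name_from_display(value) -> str:
    # Naming the field after the rendered value couples construction to formatting.
    return str(value)


def field_name_constant(value) -> str:
    # A fixed name ignores the value entirely, so it cannot fail on odd inputs.
    return "lit"


value = Unprintable()
print(field_name_constant(value))
try:
    field_name_from_display(value)
except OverflowError as err:
    print(f"deriving the name from Display failed: {err}")
```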

adriangb · 2025-06-21T13:57:43Z

For context, the failures reported here and here are both related to the overflow error fixed in this PR. If someone has seen a different error for sort_query_fuzzer_runner since #16465 was merged, please share it!

adriangb · 2025-06-21T13:58:56Z

@AdamGS @alamb @blaginin could you please review?

AdamGS · 2025-06-21T15:07:51Z

LGTM. IDK if there's a precedent for formatting the error case as an empty string elsewhere in DataFusion. It seems like format_option! uses "NULL" as a sort of display sentinel value; maybe values like this need their own placeholder?
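
To make the placeholder question concrete, here is a minimal Python sketch of the idea, not the actual Rust change in this PR. It renders a Date64-style value (milliseconds since the Unix epoch) and falls back to a configurable placeholder when the value cannot be represented as a date. The function name and the `placeholder` parameter are assumptions for illustration; the "NULL" branch only mirrors the sentinel mentioned above.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def format_date64(millis, placeholder=""):
    """Render milliseconds since the epoch as YYYY-MM-DD (hypothetical helper).

    Returns "NULL" for a missing value and `placeholder` when the value
    falls outside the range a date can represent.
    """
    if millis is None:
        return "NULL"
    try:
        return (EPOCH + timedelta(milliseconds=millis)).date().isoformat()
    except OverflowError:
        # Both an oversized timedelta and an out-of-range date raise OverflowError.
        return placeholder


print(format_date64(0))
print(repr(format_date64(2**63 - 1)))
print(format_date64(2**63 - 1, placeholder="<out of range>"))
```

An empty string keeps the formatter from failing but is indistinguishable from an empty value in output, which is the argument for a dedicated placeholder.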

alamb · 2025-06-22T11:16:44Z

> LGTM. IDK if there's a precedent for formatting the error case as an empty string elsewhere in DataFusion. It seems like format_option! uses "NULL" as a sort of display sentinel value; maybe values like this need their own placeholder?

I think @adriangb also fixed this in

use 'lit' as the field name for literal values #16498

alamb

The code and fix look good to me. Thank you for tracking this down, @adriangb and @AdamGS.

I am trying to reproduce the error locally, but so far I can't trigger it on either main or this branch. I'll report back if I am able to.

Here is my reproducer (not as fancy as what you used)

# Requires bash (uses arrays)
set -e
for i in $(seq 1 100); do
  echo "*** Iteration $i"
  pids=()
  # Launch 8 concurrent runs of the fuzz test
  for j in $(seq 1 8); do
    cargo test --test fuzz -- sort_query_fuzzer_runner &
    pids+=("$!")
  done
  # A bare `wait` always returns 0, so wait on each PID to let `set -e` stop on a failure
  for pid in "${pids[@]}"; do
    wait "$pid"
  done
done

adriangb · 2025-06-22T12:46:13Z

Should we go ahead and merge to get the test running again (or find out quickly from the many CI runs that it's still broken)? Or do you want to wait for your local testing?

alamb · 2025-06-22T12:58:47Z

> Should we go ahead and merge to get the test running again (or find out quickly from the many CI runs that it's still broken)? Or do you want to wait for your local testing?

I think we should merge it

alamb · 2025-06-22T12:58:53Z

Thank you @adriangb

revert

2ff68f0

github-actions bot added the core label Jun 20, 2025

Revert "Dynamic filter pushdown for TopK sorts (apache#15770)"

6e83cf4

alamb marked this pull request as draft June 21, 2025 11:42

adriangb added 3 commits June 21, 2025 07:46

fix ScalarValue Display impl for Date64

5134a72

handle another case

a5d646b

Revert "Revert "Dynamic filter pushdown for TopK sorts (apache#15770)""

c2f4954

This reverts commit 6e83cf4.

github-actions bot removed the documentation, optimizer, sqllogictest, datasource, and physical-plan labels Jun 21, 2025

adriangb added 2 commits June 21, 2025 08:26

move test

6e5f037

move test

d0d4689

github-actions bot removed the physical-expr label Jun 21, 2025

fmt

d0cbfd4

adriangb marked this pull request as ready for review June 21, 2025 13:27

adriangb mentioned this pull request Jun 21, 2025

SortQueryFuzzer found a failing case on main #16452

Open

adriangb changed the title ~~(debugging) re-enable sort fuzz test~~ re-enable sort_query_fuzzer_runner Jun 21, 2025

adriangb mentioned this pull request Jun 21, 2025

use 'lit' as the field name for literal values #16498

Merged

adriangb requested a review from alamb June 21, 2025 13:54

alamb approved these changes Jun 22, 2025

alamb merged commit 2bf8441 into apache:main Jun 22, 2025
30 checks passed

This was referenced Jun 22, 2025

Restore topk filtering tests #16501

Draft

Temporarily fix bug in dynamic top-k optimization #16465

Merged
